feat(router): add TLS/mTLS support to gRPC subgraphs#2861
Walkthrough
This PR adds gRPC-specific client TLS configuration and schema, refactors the TLS builders into separate HTTP and gRPC paths, threads the default and per-subgraph gRPC `*tls.Config` values through the graph server into the gRPC connector, updates the gRPC provider and test env for TLS, and adds integration tests and docs for TLS/mTLS scenarios.
Changes
gRPC Client TLS Configuration
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
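The walkthrough mentions a dedicated gRPC TLS builder that turns file-based settings into a `*tls.Config`. The sketch below shows what such a builder could look like with only the standard library. It is not the code from `router/core/tls.go`: the `GRPCClientTLSSettings` type and the `buildGRPCClientTLS` function are hypothetical names chosen for illustration. A caller would wrap the result with `credentials.NewTLS`, as the proposed `Start` fix does.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// GRPCClientTLSSettings is a hypothetical stand-in for one entry of the
// gRPC client TLS config (the global `all` entry or a per-subgraph entry).
type GRPCClientTLSSettings struct {
	Enabled                    bool
	CAFile                     string
	CertFile                   string
	KeyFile                    string
	InsecureSkipCaVerification bool
}

// buildGRPCClientTLS converts file-based settings into a *tls.Config.
// It returns (nil, nil) when TLS is disabled, so the caller can fall back
// to insecure transport credentials.
func buildGRPCClientTLS(s GRPCClientTLSSettings) (*tls.Config, error) {
	if !s.Enabled {
		return nil, nil
	}
	cfg := &tls.Config{
		InsecureSkipVerify: s.InsecureSkipCaVerification, //nolint:gosec // explicit opt-in only
	}
	// Optional custom CA bundle used to verify the subgraph's server certificate.
	if s.CAFile != "" {
		pem, err := os.ReadFile(s.CAFile)
		if err != nil {
			return nil, fmt.Errorf("reading CA file: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, errors.New("CA file contains no valid PEM certificates")
		}
		cfg.RootCAs = pool
	}
	// Client certificate for mTLS: cert and key must be provided together.
	if (s.CertFile == "") != (s.KeyFile == "") {
		return nil, errors.New("cert file and key file must be set together")
	}
	if s.CertFile != "" {
		cert, err := tls.LoadX509KeyPair(s.CertFile, s.KeyFile)
		if err != nil {
			return nil, fmt.Errorf("loading client key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	return cfg, nil
}
```

Returning `nil` for the disabled case keeps the "TLS or plaintext" decision in one place at the dial site.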
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Router-nonroot image scan passed ✅ No security vulnerabilities found in image:
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/pkg/grpcconnector/grpcremote/grpc_remote.go (1)
71-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Protect `Start` with the provider mutex.

`Start` reads and writes `g.cc` without synchronization, while the other lifecycle methods use `mu`. This can race when start, get and stop run concurrently.

Proposed fix
```diff
 func (g *RemoteGRPCProvider) Start(ctx context.Context) error {
+	g.mu.Lock()
+	defer g.mu.Unlock()
+
 	if g.cc == nil {
 		var transportCreds grpc.DialOption
 		if g.tlsConfig != nil {
 			transportCreds = grpc.WithTransportCredentials(credentials.NewTLS(g.tlsConfig))
 		} else {
 			transportCreds = grpc.WithTransportCredentials(insecure.NewCredentials())
 		}
 		clientConn, err := grpc.NewClient(g.endpoint, transportCreds)
 		if err != nil {
 			return fmt.Errorf("failed to create client connection: %w", err)
 		}
 		g.cc = clientConn
 	}
 	return nil
 }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@router/pkg/grpcconnector/grpcremote/grpc_remote.go` around lines 71 - 86, Start currently reads/writes g.cc without acquiring the provider mutex (mu), causing races with other lifecycle methods; modify RemoteGRPCProvider.Start to acquire the same mutex used by other methods (mu) at the start of the function, check g.cc while holding the lock, initialize g.cc if nil, and release the lock (use defer Unlock immediately after Lock) so Start is synchronized with Stop/GetClient and avoids data races on g.cc.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@router/core/tls.go`:
- Around line 71-73: The warning message logged when
sgCfg.InsecureSkipCaVerification is true is misleading (it says the subgraph
"inherits" from global config); update the logger.Warn call in tls.go (the
branch checking sgCfg.InsecureSkipCaVerification) to state that the subgraph TLS
config has InsecureSkipCaVerification enabled (or that the subgraph is
configured to skip CA verification), removing the word "inherits" and any
implication of global config so the message accurately reflects
sgCfg.InsecureSkipCaVerification and `logger.Warn` usage for the subgraph named
by `name`.
---
Outside diff comments:
In `@router/pkg/grpcconnector/grpcremote/grpc_remote.go`:
- Around line 71-86: Start currently reads/writes g.cc without acquiring the
provider mutex (mu), causing races with other lifecycle methods; modify
RemoteGRPCProvider.Start to acquire the same mutex used by other methods (mu) at
the start of the function, check g.cc while holding the lock, initialize g.cc if
nil, and release the lock (use defer Unlock immediately after Lock) so Start is
synchronized with Stop/GetClient and avoids data races on g.cc.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4e212fb2-4a36-4f65-bab3-b669318a1727
📒 Files selected for processing (14)
- router-tests/security/subgraph_grpc_mtls_test.go
- router-tests/testenv/testenv.go
- router/core/graph_server.go
- router/core/router.go
- router/core/router_config.go
- router/core/supervisor_instance.go
- router/core/tls.go
- router/core/tls_test.go
- router/pkg/config/config.go
- router/pkg/config/config.schema.json
- router/pkg/config/fixtures/full.yaml
- router/pkg/config/testdata/config_defaults.json
- router/pkg/config/testdata/config_full.json
- router/pkg/grpcconnector/grpcremote/grpc_remote.go
Codecov Report❌ Patch coverage is
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2861      +/-   ##
==========================================
+ Coverage   66.24%   66.32%   +0.08%
==========================================
  Files         258      258
  Lines       27056    27126      +70
==========================================
+ Hits        17923    17992      +69
+ Misses       7720     7719       -1
- Partials     1413     1415       +2
```
They are only needed when building a gRPC connector and are obsolete afterwards, so they don't need to be fields on the type.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@router/core/graph_server.go`:
- Around line 357-358: The graph mux reuse logic currently ignores TLS changes
because defaultClientTLS and perSubgraphTLS are only considered on rebuild;
update the reuse decision so muxes/connectors are not reused when those TLS
settings differ by including defaultClientTLS and perSubgraphTLS in the mux
identity/comparison (or cache key) used by the base/feature-flag reuse branches,
and force a rebuild/recreate of the mux/connector whenever those TLS values
change; locate the reuse checks that decide to keep an existing mux (the
base/feature-flag reuse branches) and add equality checks or incorporate the TLS
structs so stale gRPC credentials are not retained.
- Around line 1411-1420: After successfully calling setupConnector
(setupConnector), ensure partially-initialized resources are cleaned up if
subsequent buildGraphMux fails: either keep the connector/local resources
(caches, metric stores, pubsub providers) in local variables and only assign
them to the server state (s.*) after buildGraphMux completes successfully, or
add a short-lived defer immediately after setupConnector that calls
graphServer.Shutdown (or the connector-specific cleanup routine) and cancels
that defer if buildGraphMux succeeds; update the error-return paths after
buildGraphMux to invoke the cleanup so no providers or connectors remain running
on failure.
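For the first comment, one way to include TLS in the reuse decision is to compare a comparable snapshot of the file-based TLS settings. Comparing `*tls.Config` pointers does not work for this, because a rebuild always produces new pointers. In the sketch below, `tlsSettings`, `muxIdentity` and `canReuseMux` are hypothetical names, and `ConfigHash` stands in for whatever the existing reuse check already compares. It requires Go 1.21+ for the `maps` package.

```go
package main

import (
	"maps"
)

// tlsSettings is a hypothetical, comparable snapshot of one gRPC TLS entry.
type tlsSettings struct {
	CAFile                     string
	CertFile                   string
	KeyFile                    string
	InsecureSkipCaVerification bool
}

// muxIdentity collects the inputs that must match for an existing graph mux
// (and its gRPC connectors) to be reused. Fields are illustrative.
type muxIdentity struct {
	ConfigHash  string
	DefaultTLS  *tlsSettings // nil means no default gRPC TLS
	PerSubgraph map[string]tlsSettings
}

// canReuseMux reports whether an existing mux can be kept as-is.
func canReuseMux(prev, next muxIdentity) bool {
	if prev.ConfigHash != next.ConfigHash {
		return false
	}
	// Both sides must agree on whether a default exists, and on its value.
	if (prev.DefaultTLS == nil) != (next.DefaultTLS == nil) {
		return false
	}
	if prev.DefaultTLS != nil && *prev.DefaultTLS != *next.DefaultTLS {
		return false
	}
	return maps.Equal(prev.PerSubgraph, next.PerSubgraph)
}
```

Comparing paths does not detect a certificate that is replaced on disk under the same path. Catching that case would require hashing the file contents as well.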
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 526ede46-37b7-48f3-bd51-4acaaabcd3af
📒 Files selected for processing (2)
- router/core/graph_server.go
- router/pkg/grpcconnector/grpcremote/grpc_remote.go
🚧 Files skipped from review as they are similar to previous changes (1)
- router/pkg/grpcconnector/grpcremote/grpc_remote.go
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs-website/router/configuration.mdx`:
- Line 463: The table entry for TLS_CLIENT_GRPC_ALL_ENABLED (key:
tls.client_grpc.all.enabled) incorrectly shows a required icon; update the icon
from a required/check state to the optional/square icon so the row reflects that
this setting is optional and has a default of false (change <Icon
icon="check-square" /> to <Icon icon="square" /> for the
TLS_CLIENT_GRPC_ALL_ENABLED row).
In `@docs-website/router/security/tls.mdx`:
- Line 270: The sentence "A per-subgraph entry fully replaces the global `all`
config for that subgraph — it does not merge with it." uses an em dash; update
this line in docs-website/router/security/tls.mdx to avoid em dashes by
splitting into two sentences or using a period, e.g. "A per-subgraph entry fully
replaces the global `all` config for that subgraph. It does not merge with it."
Ensure you keep the exact `all` code token and the phrase "per-subgraph entry"
so the meaning and reference remain unchanged.
In `@router/pkg/config/config.go`:
- Around line 944-952: The Enabled() method on GRPCClientTLSConfiguration
currently only checks the boolean Enabled flags and thus ignores populated TLS
fields set via env vars; update GRPCClientTLSConfiguration.Enabled() to also
return true if any subgraph or the All config has TLS material present
(non-empty CAFile, CertFile, KeyFile or InsecureSkipCAVerification == true) in
addition to checking v.Enabled, and apply the same change to the analogous
Enabled() method referenced around lines 959-968 (check the corresponding
struct's Subgraphs, All and their
CAFile/CertFile/KeyFile/InsecureSkipCAVerification fields) so that populated TLS
fields implicitly enable TLS rather than being silently ignored.
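The first option from this comment is to derive enablement from populated fields. The sketch below shows one way to do that. `tlsEntry` and `grpcTLSConfig` are hypothetical stand-ins for the real config structs, and `effectivelyEnabled` is an illustrative helper name.

```go
package main

// tlsEntry is a hypothetical stand-in for one gRPC client TLS entry.
type tlsEntry struct {
	Enabled                    bool
	CAFile                     string
	CertFile                   string
	KeyFile                    string
	InsecureSkipCaVerification bool
}

// effectivelyEnabled treats an entry as enabled when the flag is set OR any
// TLS material is present, so env-populated files are not silently ignored.
func (e tlsEntry) effectivelyEnabled() bool {
	return e.Enabled || e.CAFile != "" || e.CertFile != "" || e.KeyFile != "" || e.InsecureSkipCaVerification
}

// grpcTLSConfig mirrors a global `all` entry plus per-subgraph entries.
type grpcTLSConfig struct {
	All       tlsEntry
	Subgraphs map[string]tlsEntry
}

// Enabled reports whether any entry, global or per-subgraph, enables TLS.
func (c *grpcTLSConfig) Enabled() bool {
	for _, v := range c.Subgraphs {
		if v.effectivelyEnabled() {
			return true
		}
	}
	return c.All.effectivelyEnabled()
}
```

With this approach, `enabled: false` can no longer switch TLS off while file paths remain set. Whether that is acceptable is a design decision for the config semantics.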
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7e3b000b-8595-4e0a-a3f2-f4eda450dc05
📒 Files selected for processing (16)
- docs-website/router/configuration.mdx
- docs-website/router/gRPC/concepts.mdx
- docs-website/router/gRPC/grpc-services.mdx
- docs-website/router/intro.mdx
- docs-website/router/security/tls.mdx
- router-tests/security/subgraph_grpc_mtls_test.go
- router-tests/security/subgraph_mtls_test.go
- router/core/graph_server.go
- router/core/tls.go
- router/core/tls_test.go
- router/pkg/config/config.go
- router/pkg/config/config.schema.json
- router/pkg/config/config_grpc_tls_test.go
- router/pkg/config/fixtures/full.yaml
- router/pkg/config/testdata/config_defaults.json
- router/pkg/config/testdata/config_full.json
✅ Files skipped from review due to trivial changes (4)
- docs-website/router/gRPC/concepts.mdx
- router/pkg/config/testdata/config_full.json
- router/pkg/config/fixtures/full.yaml
- docs-website/router/intro.mdx
| Environment Variable | YAML | Required | Description | Default Value |
| --- | --- | --- | --- | --- |
| TLS_CLIENT_GRPC_ALL_ENABLED | tls.client_grpc.all.enabled | <Icon icon="check-square" /> | Enable TLS for all gRPC subgraph connections. | false |
Fix required marker for tls.client_grpc.all.enabled.
This field is shown as required, but the table also states a default of false. Mark it as optional (square) to avoid implying users must set it explicitly.
Suggested doc fix
```diff
-| TLS_CLIENT_GRPC_ALL_ENABLED | tls.client_grpc.all.enabled | <Icon icon="check-square" /> | Enable TLS for all gRPC subgraph connections. | false |
+| TLS_CLIENT_GRPC_ALL_ENABLED | tls.client_grpc.all.enabled | <Icon icon="square" /> | Enable TLS for all gRPC subgraph connections. | false |
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```mdx
| TLS_CLIENT_GRPC_ALL_ENABLED | tls.client_grpc.all.enabled | <Icon icon="square" /> | Enable TLS for all gRPC subgraph connections. | false |
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs-website/router/configuration.mdx` at line 463, The table entry for
TLS_CLIENT_GRPC_ALL_ENABLED (key: tls.client_grpc.all.enabled) incorrectly shows
a required icon; update the icon from a required/check state to the
optional/square icon so the row reflects that this setting is optional and has a
default of false (change <Icon icon="check-square" /> to <Icon icon="square" />
for the TLS_CLIENT_GRPC_ALL_ENABLED row).
```mdx
#### Per-Subgraph Configuration

Override the global config for specific gRPC subgraphs. A per-subgraph entry fully replaces the global `all` config for that subgraph — it does not merge with it.
```
Replace em dash in per-subgraph override note.
Use a period or split the sentence. Em dashes are disallowed in docs.
Suggested doc fix
```diff
-Override the global config for specific gRPC subgraphs. A per-subgraph entry fully replaces the global `all` config for that subgraph — it does not merge with it.
+Override the global config for specific gRPC subgraphs. A per-subgraph entry fully replaces the global `all` config for that subgraph. It does not merge with it.
```
As per coding guidelines: "Avoid em dashes. Use periods or restructure the sentence instead."
📝 Committable suggestion
```mdx
Override the global config for specific gRPC subgraphs. A per-subgraph entry fully replaces the global `all` config for that subgraph. It does not merge with it.
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs-website/router/security/tls.mdx` at line 270, The sentence "A
per-subgraph entry fully replaces the global `all` config for that subgraph — it
does not merge with it." uses an em dash; update this line in
docs-website/router/security/tls.mdx to avoid em dashes by splitting into two
sentences or using a period, e.g. "A per-subgraph entry fully replaces the
global `all` config for that subgraph. It does not merge with it." Ensure you
keep the exact `all` code token and the phrase "per-subgraph entry" so the
meaning and reference remain unchanged.
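The "replaces, does not merge" rule in that doc sentence determines which settings a subgraph actually gets. The sketch below illustrates the lookup. `clientTLS`, `grpcClientTLS` and `effectiveTLS` are hypothetical names, not the router's types. The point it shows is that a field left empty in a per-subgraph override is not backfilled from `all`.

```go
package main

// clientTLS is a hypothetical per-entry gRPC TLS setting.
type clientTLS struct {
	Enabled  bool
	CAFile   string
	CertFile string
	KeyFile  string
}

// grpcClientTLS mirrors a global `all` entry plus optional per-subgraph
// overrides keyed by subgraph name.
type grpcClientTLS struct {
	All       clientTLS
	Subgraphs map[string]clientTLS
}

// effectiveTLS returns the settings used for one subgraph. A per-subgraph
// entry replaces `all` wholesale; nothing is merged field by field.
func (c grpcClientTLS) effectiveTLS(subgraph string) clientTLS {
	if sg, ok := c.Subgraphs[subgraph]; ok {
		return sg
	}
	return c.All
}
```

Because of this rule, an override that needs the global CA must repeat `ca_file` (or its equivalent) explicitly.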
```go
// Enabled returns true if any subgraph or the default settings have TLS enabled.
func (c *GRPCClientTLSConfiguration) Enabled() bool {
	for _, v := range c.Subgraphs {
		if v.Enabled {
			return true
		}
	}

	return c.All.Enabled
}
```
Do not ignore populated gRPC TLS settings when enabled is omitted.
Line 945 only checks the boolean flags. If an operator sets TLS_CLIENT_GRPC_ALL_CA_FILE, CERT_FILE, KEY_FILE, or INSECURE_SKIP_CA_VERIFICATION via env and forgets TLS_CLIENT_GRPC_ALL_ENABLED, this config stays silent and TLS is treated as disabled. LoadConfig validates YAML, not env-populated state, so this turns into a plaintext fallback instead of a validation error. Either derive enablement from populated TLS fields here, or add post-load validation that rejects TLS material when enabled is false.
Also applies to: 959-968
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@router/pkg/config/config.go` around lines 944 - 952, The Enabled() method on
GRPCClientTLSConfiguration currently only checks the boolean Enabled flags and
thus ignores populated TLS fields set via env vars; update
GRPCClientTLSConfiguration.Enabled() to also return true if any subgraph or the
All config has TLS material present (non-empty CAFile, CertFile, KeyFile or
InsecureSkipCAVerification == true) in addition to checking v.Enabled, and apply
the same change to the analogous Enabled() method referenced around lines
959-968 (check the corresponding struct's Subgraphs, All and their
CAFile/CertFile/KeyFile/InsecureSkipCAVerification fields) so that populated TLS
fields implicitly enable TLS rather than being silently ignored.
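The review also suggests an alternative: keep `enabled` authoritative and run a post-load validation that rejects TLS material when `enabled` is false. The sketch below shows that check. `tlsEntry` and `validateGRPCTLS` are hypothetical names, and the error wording is illustrative. Unlike implicit enabling, this turns a forgotten `TLS_CLIENT_GRPC_ALL_ENABLED` into a startup error rather than a silent plaintext fallback.

```go
package main

import "fmt"

// tlsEntry is a hypothetical stand-in for one gRPC client TLS entry.
type tlsEntry struct {
	Enabled                    bool
	CAFile                     string
	CertFile                   string
	KeyFile                    string
	InsecureSkipCaVerification bool
}

// validateGRPCTLS rejects any entry that carries TLS settings while its
// enabled flag is false. It is intended to run after env overrides are applied.
func validateGRPCTLS(all tlsEntry, subgraphs map[string]tlsEntry) error {
	check := func(where string, e tlsEntry) error {
		hasMaterial := e.CAFile != "" || e.CertFile != "" || e.KeyFile != "" || e.InsecureSkipCaVerification
		if hasMaterial && !e.Enabled {
			return fmt.Errorf("%s: TLS settings are present but enabled is false", where)
		}
		return nil
	}
	if err := check("tls.client_grpc.all", all); err != nil {
		return err
	}
	for name, e := range subgraphs {
		if err := check(fmt.Sprintf("subgraph %q", name), e); err != nil {
			return err
		}
	}
	return nil
}
```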
WIP
Summary by CodeRabbit
New Features
Tests
Documentation
Checklist
Open Source AI Manifesto
This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.